Acceleration of Large Transformer Model Training by Sensitivity-Based Layer Dropping
Authors
Abstract
Transformer models are widely used in AI applications such as Natural Language Processing (NLP) and Computer Vision (CV). However, the enormous computation workload is an obstacle to training large transformer models efficiently. Recently, some methods have focused on reducing the computation workload during training by skipping some layers. However, these methods use a simple probability distribution and coarse-grained probability calculation, which significantly affect model accuracy. To address this issue, in this paper we propose a novel method to accelerate training: Sensitivity-Based Layer Dropping (SBLD). SBLD uses layer-wise sensitivity data to switch transformer layers on and off in the proper order to keep accuracy high. In addition, we adjust the probability of skipping transformer layers with a scheduler to accelerate training and get faster convergence. Our results show that SBLD solves the accuracy drop issue compared with prior layer dropping methods. SBLD can decrease end-to-end training time by 19.67% when training the GPT-3 Medium model, while at the same time increasing accuracy by 1.65% w.r.t. the baseline. Furthermore, for the SwinV2-L model, the obtained Top-1 and Top-5 accuracies are also higher vs. the baseline. Thus, the proposed method is efficient and practical for improving large transformer model training.
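The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the linear `drop_rate` schedule, the `max_rate` parameter, the sensitivity scores, and the deterministic "skip the least sensitive layers first" rule are all assumptions made for illustration. The abstract speaks of a skipping probability, which this sketch simplifies to a fixed fraction of layers per step.

```python
def drop_rate(step, total_steps, max_rate=0.5):
    """Hypothetical linear scheduler: the fraction of layers to skip
    ramps from 0 at step 0 up to max_rate at total_steps."""
    progress = min(max(step / total_steps, 0.0), 1.0)
    return max_rate * progress


def layers_to_skip(sensitivity, rate):
    """Return the sorted indices of the layers to skip.

    Assumption: a lower sensitivity score means the layer can be
    skipped with less impact, so the least sensitive layers go first.
    """
    n_skip = int(rate * len(sensitivity))
    order = sorted(range(len(sensitivity)), key=lambda i: sensitivity[i])
    return sorted(order[:n_skip])


def forward(x, layers, skipped):
    """Apply the layers in order; a skipped layer acts as the identity."""
    skipped = set(skipped)
    for i, layer in enumerate(layers):
        if i in skipped:
            continue
        x = layer(x)
    return x


if __name__ == "__main__":
    # Toy stand-ins for transformer layers and made-up sensitivity scores.
    layers = [lambda v: v + 1 for _ in range(4)]
    sensitivity = [0.9, 0.1, 0.5, 0.2]
    for step in (0, 50, 100):
        rate = drop_rate(step, total_steps=100)
        skipped = layers_to_skip(sensitivity, rate)
        print(step, rate, skipped, forward(0, layers, skipped))
```

In a real training loop the layers would be transformer blocks with residual connections, and the sensitivity scores would come from a measurement the abstract does not specify.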
Similar resources
Training Tips for the Transformer Model
This article describes our experiments in neural machine translation using the recent Tensor2Tensor framework and the Transformer sequence-to-sequence model (Vaswani et al., 2017). We examine some of the critical parameters that affect the final translation quality, memory usage, training stability and training time, concluding each experiment with a set of recommendations for fellow researchers...
Variance-Based Sensitivity Analysis of Deterministic Model
The study of many scientific and natural phenomena under laboratory conditions is sometimes impossible; they are therefore expressed by mathematical models and simulated by complex computer models (codes). Running a computer model with different inputs is called a computer experiment. Statistical issues account for a wide range of applications of computer experiments. In this paper, ...
Ultra-compact Piezoelectric Transformer Charged Particle Acceleration
The high voltage capability, compact size, and high output impedance of piezoelectric transformers make them very attractive primary high voltage supplies for charged particle beam applications. This paper will present and contrast recent results utilizing piezoelectric transformers to accelerate electrons and ions, to produce x-rays and neutrons for imaging and active interrogation and to pro...
A Study on Insurer Solvency by Panel Data Model: The Case of Iranian Insurance Market
The aim of this thesis is to develop an approach for assessing the solvency of Iranian insurance companies. We use economic data with both time-series and cross-sectional variation, and thus survey insurer solvency using a panel data model.
Mortality Forecasting Based on Lee-Carter Model
Over the past decades, a number of approaches have been applied to forecasting mortality. In 1992, a new method for long-run forecasts of the level and age pattern of mortality was published by Lee and Carter. This method was welcomed by many authors, so it was extended through a wider class of generalized, parametric and nonlinear models. This model represents one of the most influential recent d...
Journal
Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence
Year: 2023
ISSN: 2159-5399, 2374-3468
DOI: https://doi.org/10.1609/aaai.v37i9.26321